
create initial benchmark #459

Open

evisdren wants to merge 1 commit into main from optimize_claude_start

Conversation

@evisdren
Contributor

Entire-Checkpoint: 4157af44db5f
Copilot AI review requested due to automatic review settings February 22, 2026 21:53
@evisdren evisdren requested a review from a team as a code owner February 22, 2026 21:53
@cursor

cursor bot commented Feb 22, 2026

PR Summary

Low Risk
Adds benchmark-only helpers and an integration benchmark; no production logic or data handling paths are modified.

Overview
Adds an integration benchmark that measures end-to-end latency of the hooks claude-code session-start subprocess, with sub-benchmarks that independently scale session state file count, git ref/branch count, tracked repo file count, and commit history depth.

Extends benchutil.BenchRepo with helpers to seed many branch refs and run git pack-refs --all, enabling more realistic ref-scanning performance testing.

Written by Cursor Bugbot for commit 8a0416e.


@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.

Bugbot Autofix is OFF. To automatically fix reported issues with Cloud Agents, enable Autofix in the Cursor dashboard.

Comment @cursor review or bugbot run to trigger another review on this PR

}

b.ReportMetric(float64(time.Since(start).Milliseconds()), "ms/op")
}

ReportMetric in loop only keeps last iteration's value

Medium Severity

b.ReportMetric overwrites any previously reported value for the same unit, so calling it inside the for range b.N loop means only the last iteration's latency is reported as ms/op; all earlier measurements are discarded. Instead, accumulate the total time across all iterations, divide by b.N after the loop ends, and report the result once with a single b.ReportMetric call. With -benchtime=5x, the current code silently throws away 4 out of 5 samples.


Contributor

Copilot AI left a comment


Pull request overview

This PR adds initial benchmarking infrastructure to measure the performance of the entire hooks claude-code session-start command under various scaling conditions. The benchmarks help identify performance bottlenecks as repositories grow in size, number of sessions, refs, and commit history.

Changes:

  • Added integration benchmark suite for hook session-start latency measurements
  • Extended benchutil with helpers to seed branches and pack refs for realistic testing scenarios

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

  • cmd/entire/cli/integration_test/hook_bench_test.go: New benchmark suite testing session-start hook performance across four scaling dimensions: session count, ref count, repo files, and commit history
  • cmd/entire/cli/benchutil/benchutil.go: Added SeedBranches and PackRefs helper methods to simulate repos with many git refs for benchmark testing

claudeProjectDir := b.TempDir()

b.ResetTimer()
for range b.N {

Copilot AI Feb 22, 2026


This benchmark uses the older for range b.N pattern instead of the newer b.Loop() pattern introduced in Go 1.24. The codebase uses Go 1.25.6, and all other benchmarks in benchutil_test.go consistently use b.Loop().

While for range b.N still works, using b.Loop() is the modern approach and provides better ergonomics. Consider updating this to match the pattern used throughout the codebase for consistency.

Copilot uses AI. Check for mistakes.
Comment on lines +142 to +156
start := time.Now()

cmd := exec.Command(binary, "hooks", "claude-code", "session-start")
cmd.Dir = repo.Dir
cmd.Stdin = bytes.NewReader(stdinPayload)
cmd.Env = append(os.Environ(),
"ENTIRE_TEST_CLAUDE_PROJECT_DIR="+claudeProjectDir,
)

output, err := cmd.CombinedOutput()
if err != nil {
b.Fatalf("session-start hook failed: %v\nOutput: %s", err, output)
}

b.ReportMetric(float64(time.Since(start).Milliseconds()), "ms/op")

Copilot AI Feb 22, 2026


Using manual timing with time.Now() and b.ReportMetric alongside b.ResetTimer() creates competing timing mechanisms. When using b.Loop(), the testing framework automatically handles timing, eliminating the need for manual time.Now()/time.Since() calls and b.ReportMetric().

If you switch to b.Loop() as suggested, you can remove the manual timing code (lines 142, 156) and let the framework handle it automatically. The framework's timing is more accurate as it accounts for framework overhead.


Labels

None yet


1 participant